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Abstract 

We obtain, using transfer matrix methods, the distribution function P{R) of the end-to-end 
distance, the loop formation probabiUty and force-extension relations in a model for short double- 
stranded DNA molecules. Accounting for the appearance of "bubbles", localized regions of en- 
hanced flexibility associated with the opening of a few base pairs of double-stranded DNA in 
thermal equilibrium, leads to dramatic changes in P{R) and unusual force-extension curves. An 
analytic formula for the loop formation probability in the presence of bubbles is proposed. For 
short heterogeneous chains, we demonstrate a strong dependence of loop formation probabilities on 
sequence, as seen in recent experiments. 
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The physics of bending and loop formation in DNA is key to a variety of regulatory 
processes within the cell. Loop formation in DNA is believed to be central to enhancer 
action, while the compactness of DNA packaging within the nucleosome necessitates the 
bending of DNA over length scales of a few tens of base pairs . In a broader context, 
the modelling of bending and looping in biopolymers at length scales over which intrinsic 
stiffness plays a dominant role is a problem of general relevance. 

Worm-like chain (WLC) models of DNA elasticity incorporate semi-flexibility, describing 
the chain in terms of a persistence length Ip, the length-scale at which tangent vectors to 
the polymer are decorrelated ^. On scales smaller than Ip, bending energy dominates and 
the chain is relatively stiff. It has conventionally been assumed that the relatively large 
energy required to bend short double stranded (ds) DNA of length L ~ 100 base-pairs (bp) 
into loops necessitates the intervention of DNA-binding proteins. Recent DNA cyclization 
experiments of Cloutier and Widom fCW) which study relatively small, isolated dsDNA 
sequences question this assumption In the CW experiments, DNA molecules that are 
94 bp in length, comparable to sharply looped DNAs in vivo, spontaneously cyclize with a 
large probability. Theories of DNA elasticity based on the homogeneous WLC model pre- 
dict cyclization probabilities three to four orders of magnitude smaller than those obtained 
experimentally [5|, indicating, in the words of Cloutier and Widom, "a need for new theories 
of DNA bending" ^. 

The problem posed by these experiments has stimulated much recent work on modelling 
loop formation in short DNA moleculesj^, 0|. An attractive explanation for this discrepancy 



is the existence of "bubbles" , localized regions of 
few base pairs of dsDNA in thermal equilibrium 



arge flexibility induced by the opening of a 



(Alternatively, it has been argued that 
non-linear elastic effects relevant at high curvature might induce a kinking transition 
Bubbles (or kinks) are argued to greatly increase the flexibility of the WLC in their vicinity, 
thereby enhancing the cyclization probability 0, Recent transfer-matrix based calcula- 
tions implement this idea, but restrict themselves to the computation of the cyclization 
probability^]. 

In this Letter we report the first calculation of the distribution function of the end-to-end 
distance P{R) in a model for short dsDNA fragments with equilibrium bubbles. We propose 
an analytic formula describing the loop formation probability density P(0) in such fragments 
and compare this formula with results from numerical calculations. Varying the chemical 
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potential for bubbles leads to behaviour which interpolates between fully flexible, semiflexible 
and rigid rod limits, resulting in a variety of non-trivial force-extension curves. Simple 
extensions of our model which simulate DNA heterogeneity indicate that such heterogeneities 
can be important determinants of the cyclization probability in small chains. 

We use the well-known connection of the WLC model, with hamiltonian Hwlc_ = 
k/2 ds {dt{s) / ds)'^ constrained by |t^(s)| = 1, to the Heisenberg spin model [ol, |lO|. 
Here t(s) = dr{s)/ds is the unit tangent vector and k is the bending stiffness. The persis- 
tence length is Ip = with /? = l/fc^T. Mapping the continuum model to the discrete one 
requires that a minimum coarse-graining length scale be specified. We fix this to be 6 = 1 
nm (3 bp's). (This also represents the scale at which the smallest bubbles appear.) Thus, 
a 150 bp chain is represented by an L=50 site spin model. In what follows, both L and Ip 
are dimensionless numbers representing the physical chain length and persistence length in 
units of this basic scale. The bending stiffness and the coupling constant in the spin model 
are related through J = n/b. In the WLC model, the distribution function of the end-to-end 
vector -P(R) characterizes the conformations of the polymer; it depends only on the ratio 
L/lp. Rotational invariance imposes -P(R) = P(|R|) = -P(-R). 

The energetics of ds DNA in the presence of bubbles in equilibrium is modelled via the 
following hamiltonian ^ 

N-l N-l 
i=l i=l 

Here Tj is a local variable which specifies whether a given bond forms part of a bubble or not: 
Tj = 1 if the site i is a part of the bubble, otherwise = 0. A chemical potential fi controls 
the energetics of the variable and hence the number of bubbles^]. The bending stiffness 
of double-stranded and bubble regions map to coupling constants Jd and Js respectively. 
The distribution function P(R) of the end-to-end vector R = J2iLi is 

/„ 1 1 N-l 

dt,... dt^J^- E ^'^""^(Yl U-R), (2) 

ri=0 rjv_i=0 i 

where Af fixes J dHP{R) = 1. With one end of the polymer at R = 0, the 
probability p{z) for the other to be in a given 2-plane is related to -P(R) through 
p{z) = f dTlP(R)S(R^ — z), where R = RiCx + R2(^y + -RsCz. Defining a generat- 
ing function p[f) ^] through p[f) = J_j^dz exp{fz)p{z), yields, upon substituting 



definitions of p{z) and P{R), the relation p{f) = Z{f)/Z{f = 0), wliere Z{f) = 
J dti. . . J dtN Zlri=o •■• Z1tjv_i=o i-f^H + f Ylf^dl))- After performing the n summa- 
tion, Z(f) can be computed using transfer matrices jl^]- Once Z{f) is known, p(f) and p{z) 
can be computed. P{R) is then finally obtained, exploiting the tomographic methods out- 
lined in Ref. jll| and symmetry arguments, from P{R) = {—l/2T{z)dp{z)/dz\z=R- We can 
either allow the tangent vectors at the two ends to fluctuate independently, corresponding to 
free boundary conditions on the tangent vectors, or require that they be equal, a boundary 
condition believed to be appropriate to the CW experiments. 

We benchmark our methods by computing P{R) for a homogeneous semifiexible polymer 
in the absence of bubbles, {i.e. ji = oo), at various Ljlp values. Our approach yields results 
which are fully consistent with previous work, reproducing known answers from simulations 
and analytic work 13, ^|. In addition, for L/lp = 3.85, we recover the double hump 
feature in the distribution function reported recently W^e have also computed the 



loop formation probability density P{0) (defined as P{R = 0)) for the entire range of L/lp 
values, obtaining results which agree well with the Shimada-Yamakawa formulap]. 
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FIG. 1: (a) P{R) calculated for a 150 bp DNA fragment with pja = 50 and pjs = 1 for = 6, 

(ii) = 7 and (iii) Pfj, = 9. The curve (iv) is P{R) from the WLC calculations (/3/i = oo). (b) 
P(0) from the transfer matrix calculation (points) compared to the predictions of the aproximate 
formula of Eq. ^ (lines). The curves from top to bottom are for (3 fx = 10, 12, 15, 18 and oo. 



To incorporate bubble formation, we fix i3Jd = 50 for the double stranded region and 
f]Js = 1 for the bubble region, consistent with measures of the persistence length 16|. Our 
results for L = 50 are plotted in Fig. Wi^) for PfJ' = 6, 7 and 9. (The free energy cost in units 
of temperature jSu to form a 3bp bubble can be estimated to lie between 6 and 15 depending 
on the sequence j6[). For Pfx = 6 the distribution function peaks at i? = 0. Remarkably, for 
Pfi = 9, the distribution function alters completely, with the peak shifting to near R = L. 



For intermediate values of fi, the peak of P{R) interpolates between R = and R c:^ L. 
For L/lp = 1, P{R) exhibits a double hump feature when /Sfi = 6.9, as described earlier for 
semiflexible chains in the regime in which L/lp ~ 3.85. 

The loop formation probability density P(0) is experimentally accessible. Fig.^b) shows 
P(0) (symbols) at different values of /i, as L is varied. The boundary condition imposed 
allows the tangent vectors at the chain ends to fluctuate independently. The lines through 
the data points are predictions of the analytic theory described below. At large fi, the 
results asymptote to those obtained for /i — > oo. As /i is decreased from infinity, bubbles are 
favoured and -P(O) increases sharply at small L. 

We have specialized our calculations to allow for the insertion of a single bubble at 
arbitrary points along the chain. This allows us to check for the optimal bubble location. 
We compute P{R) for the case in which a single bubble is placed at the centre {L/2) as well 
as at positions which deviate from the central position by one and two sites on both sides, as 
shown in Fig. |2l At the parameter values P Jd = L = 50, the distribution function is peaked 
sharply near L. Allowing for a single bubble at the centre transforms P{R) completely, 
shifting the peak to i? = 0. The peak of the distribution function moves away from R = 0, 
as the bubble position moves off-centre, with P{R) peaking near R = L as the bubble 
position shifts to the chain end. We plot P{0) as a function of the bubble position in the 
inset to Fig [21 As the bubble position moves 3 units away from the center, P(0) drops by 
one order of magnitude. This peak at L/2 becomes sharper for small L, implying that the 
principal contribution to the loop forming probability density for short chains comes from 
bubbles positioned at the chain center. 

This observation justifies a simple analytic approach to P{R) and the loop formation 
probability density: The distribution function P{R) for short DNA molecules {L <^ Ip, so 
they may be assumed to be rigid to a first approximation) with one bubble in the middle can 
be represented in terms of two infinitely rigid rods of length L/2 connected with a flexible 
hinge. In the limit where the bending energy cost at the hinge is zero, the probability 
distribution function is that of a random walk with two steps of equal size L/2. This yields 
P{R) = 1/ {2ttL'^R), a behavior similar to that obtained from the transfer matrix calculation; 
see the top curve of Fig. |21 (main panel). 

Analytic results beyond the assumption of rigid rod behaviour can be derived for the loop 
formation probability density at large fi, at which the effects of a single bubble dominate. 
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We have found that the distribution function of the end-to-end vector PfR, L, Ip) for a 



semiflexible chain of length L with a persistence length Ip in a regime relevant to the DNA 
molecules in the CW experiment, is dominated by the weighted sum of two terms. The first 
term, Pq{Ip,L), is the contribution in the absence of bubbles, while the second, Pi{lp,L), 
reflects the contribution from a single bubble placed at the centre of the chain. Thus, 

0, the loop formation probability density, is then 
1 



P(R, L, Ip) in the limit R 



P(0) 



1 + 2Le-^f' 



[Po{lp, L)+2lpe-''^P^{lp,L 



(3) 



(The factors of 2lpe~'^^ are fixed by the normalization appropriate to the calculation of the 
loop formation probability density from the spin model.) The Shimada-Yamakawa theory 
of the cyclization of semiflexible polymers provides quantitative estimates of Po(^p,-t')0 in 
the limit relevant to the Cloutier-Widom experiments. 

The term Pi{lp,L) represents the loop forming probability density of a chain of length 
L with a flexible hinge in the center. We exploit an accurate variational expression for 
the end-to-end distribution function of a semiflexible chain of length L ^|: P(R, L) = 



Cjg) 
L3 



1 



l-(i?/L)2 



9/2 



exp 



l-(i?/L)2 



with g = 9L/8lp, and obtain P(0) from an integral over the 
product of P(R,L)'s of the form / dRP(R, L/2)P(-R, L/2). This leads to Pi(/p,L) = 
^ 32ncHg/2) ^j ^^^f{x) ^ ^^leie X = 2R/L and f{x) = 2\nx - 91n(l - x^) -g/{l- x^), and 
C{g/2) is a normalization factor 13] . The integral is first approximated by saddle point 
techniques. We then multiply the resulting analytic expression by a further constant factor 
of 2.273, to match with results obtained through numerical integration. This procedure 
yields near perfect agreement with numerics in the range < L/lp < 3; a range over 
which the integral varies across 12 orders of magnitude. Further specializing the resulting 
expression to the L/lp regime explored in the CW experiments yields a single formula for 
P(0), involving and Ip-. 

1 



P(0) 



1 + 2lpe-Pf' 



112 04-te"^^-°^^^+°-^^^^ 



^■2 1^ 



-15 
,2^ — 



'1 _L 1^ _L ^^\-2p Si* ' ^ 

3L ^ 27L2 ) ^ 



(4) 



where y"^ = 1.037 — 0.16Q7L/lp. This formula is compared to the results of the transfer 
matrix calculations in Fig. ^ The agreement is satisfactory, particularly at large /i, where 
the "single bubble" approximation is expected to be accurate. 




FIG. 2: i^(R.) for homogeneous dsDNA of length L = 50 computed with a single bubble placed 
at L/2 (top curve), L/2 it 1 (middle curve) and L/2 it 3 (bottom curve). The inset shows i-'(O) 
as a function of bubble position, illustrating how this quantity peaks sharply when the bubble is 
placed at the centre. 

Variations in P{R) as a consequence of bubble formation should be reflected in exper- 
imentally measurable force extension (FE) relations. Experiments typically measure the 
average force required to maintain the two ends of the polymer at a fixed separation R. 
This average force is (/) = — d log {P{R))/dR. Since our calculations access P{R) directly, 
we can calculate this average force as a function of extension in all the cases discussed ear- 
lier. We find that the FE curves are strongly ensemble dependent for small chains - FE 
relations in the fixed force ensemble are always monotonicQ], while non-monotonic relations 
can be obtained in the fixed extension ensemble. Such non-monotonicity disappears in the 
L — s> oo limit, where FE curves calculated in both ensembles coincide, as expected. Fig. El 
shows plots for homogeneous chains setting i3J = 50 and jSfi = 6 and 9, of (/) vs. R, in 
the constant extension ensemble (plots (b) and (a), respectively), as well as in the constant 
force ensemble, in which (R) vs. f is calculated (see plots (c) and (d)), illustrating these 
conclusions. 

We have also investigated the effects of sequence heterogeneity on the loop formation 
probability density and P{R). The energy to break paired bases is strongly sequence depen- 
dent, and alters the local value of J. (It is known that A-T bonds are more easily broken 
than G-C bonds 0,Q.) Intuitively, regions of reduced bending rigidity should play a role 
similar to that played by bubbles, reducing the energy required to bend the chain at spe- 
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FIG. 3: Force extension relation for homogeneous dsDNA of length L = 50 at parameter values 
[5 J = 50 with (3^. = 6 and = 9. The curves (a) and (b) are computed in the fixed-extension 
ensemble whereas the curves (c) and (d) are computed in the fixed force ensemble. 

cific locations. We have experimented with 50:50 mixtures of bonds with strength (33 = 40 
(weak) and /3J = 60 (strong) in a system of size L = 100 and arranged in the following 
way: (a) two equal stretches of strong bonds at each end, separated by 50 weak bonds in 
the central region (b) two equal stretches of weak bonds at each end, separated by 50 strong 
bonds in the central region and (c) a typical random sequence formed by laying weak and 
strong bonds down at random, subject to the constraint that they are equal in number. We 
have checked that in the limit of large chains, the results for a typical random sequence of 
weak and strong bonds are close to those obtained from the homogeneous case with f3J = 50. 

Our results are the following: as is intuitively clear, the case in which the stretch of weak 
bonds is placed in the centre, case (a), yields the largest values for P(0) while the case in 
which strong bonds populate the centre, case(b), yields the smallest. The random chain 
result, case (c), is close to the result for the homogeneous case. The variation in P(0) spans 
a full order of magnitude or more for short chains at these parameter values, indicating the 
importance of sequence heterogeneity for loop formation. 

In conclusion, we have computed a wide range of physical properties of a simple model 
for short dsDNA molecules which incorporates the presence of bubbles in equilibrium. We 
point out that several physical properties of short dsDNA molecules, as reflected in P{R), are 
strongly affected by bubble formation. Our model is easily extended to account for sequence 
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heterogeneity. The unusual force-extension relations in diverse ensembles exhibited here 
may have implications for loop formation in vivo. 
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